Abstract

The dual problem of testing the predictive significance of a particular covariate, and identification of the 
set of relevant covariates is common in applied research and methodological investigations. To study this 
problem in the context of functional linear regression models with predictor variables observed over a grid 
and a scalar response, we consider basis expansions of the functional covariates and apply the likelihood 
ratio test. Based on p-values from testing each predictor, we propose a new variable selection method, 
which is consistent in selecting the relevant predictors from a set of available predictors that is allowed
to grow with the sample size n. Numerical simulations suggest that the proposed variable selection 
procedure outperforms existing methods found in the literature. A real dataset from weather stations in 
Japan is analyzed. 
1. Introduction

In regression analysis, selecting the relevant set of predictors is a fundamental step for building 
a good predictive model. Including insignificant predictors results in over-complicated models with less 
predictive power and reduced ability to discern and interpret the influence of each variable. However, 
classical selection methods have to be adapted to the high-dimensional data sets which are becoming 
increasingly common in several areas of research. 

When the data is observed at several time (or space) points, simple linear regression models cannot be 
directly used. Functional regression models (FRM) express the discrete observations of the predictor as a 
smooth function, and inference can then be made about a response variable based on the functional data 
(Ramsay and Silverman, 2005). Such models have become increasingly useful due to their large number 
of applications; see Horvath and Kokoszka (2012) for some fundamental results and Ferraty and Vieu
(2006) for a nonparametric approach. This high demand has recently spurred important theoretical
advances; see for example James (2002), Ferraty and Vieu (2009), James, Wang and Zhu (2009), Ferraty,
Laksaci, Tadj and Vieu (2010), Aneiros and Vieu (2013), and Goia and Vieu (2014), to cite a few.

However, only a few authors have considered variable selection in functional regression analysis. 
Aneiros and Vieu (2014) show how to perform variable selection using the continuous structure of the 
functional predictors by studying which of the discrete observed points should be incorporated. Using 
a partial linear model for multi-functional data, Aneiros and Vieu (2015) propose a variable selection 
method based on the continuous specificity of the functional data. Cuevas (2014, Section 5) presents an 
interesting overview of recent methods for functional data analysis including functional regression. Most 
recent contributions in regression for these models can be found in Bongiorno et al. (2014). Another class 
of such methods uses regularization techniques, where the penalty simultaneously shrinks parameters and 
selects variables. Matsui and Konishi (2011) studied the group SCAD regularization for estimating and 
selecting functional regressors while Mingotti, Lillo and Romo (2013) and Hong and Lian (2011) generalized
the Lasso for the case of scalar regressors and a functional response. Other recent contributions
to the variable selection problem in functional models are Fan and Li (2004), Aneiros, Ferraty, and Vieu 
(2011), Gertheiss, Maity, and Staicu (2013) and Ma, Song and Wang (2013). 

In this paper, we propose a different approach, exploiting the conceptual connection between model 
testing and variable selection: dropping a covariate from the model is equivalent to not rejecting the 
null hypothesis that its corresponding parameter(s) is equal to zero. Abramovich, Benjamini, Donoho 
and Johnstone (2006) showed that the application of a false discovery rate (FDR) controlling procedure, 
such as Benjamini and Yekutieli (2001), on p-values resulting from testing each null hypothesis can be 
translated into minimizing a model selection criterion. The extension and adaptation of the theory of 
hypothesis testing to functional models have been studied by several authors in the literature (Cardot, 
Goia, and Sarda, 2004, Yang and Nie, 2008, Swihart, Goldsmith and Crainiceanu, 2013, Kong, Staicu 
and Maity, 2013, McLean, Hooker and Ruppert, 2014, Pomann, Staicu and Ghosh, 2014). An interesting 
application can be found in Meinshausen, Meier and Buhlmann (2009), with results on the connection 
between p-values and variable selection in regression analysis. 

The main objective of this paper is twofold: study the asymptotic properties of the hypothesis test 
based on residual sum of squares for the relevance of a predictor in a multivariate functional regression 
model; and propose a competitive variable selection procedure based on FDR (or Bonferroni) corrections 
applied on the p-values from the tests of each available functional predictor. The proposed test statistic 
is a likelihood ratio type test, where restricted and full models are estimated through the B-Splines basis 
expansions of both coefficients and functional predictors. We examine the shift (non-centrality parameter) 
of the distribution of the test statistic under the alternative hypothesis, which provides insight into the 
power of the test and leads to the proof of consistency of the variable selection procedure.

The remainder of this paper is organized as follows. In Section 2, we formally describe the regression model
with functional covariates and scalar response via basis expansions. In Section 3, we present the testing
procedure and the variable selection method. In Section 4, we evaluate the finite sample performance of
the proposed variable selection method through simulation examples, and a real application with weather
data is considered in Section 5.


2. The functional regression model: FRM 


Suppose that we have $n$ observations $\{(y_i, x_i(t)) : t \in \mathcal{T}, i = 1, \ldots, n\}$, where $y_i$ is a scalar response,
$x_i(t) = (x_{i1}(t_1), \ldots, x_{iM}(t_M))$ are functional predictors and $\mathcal{T} = \mathcal{T}_1 \times \cdots \times \mathcal{T}_M$. Each $\mathcal{T}_m$, $m = 1, \ldots, M$,
is a compact set in $\mathbb{R}$ where the $m$-th predictor may be observed. The functional predictors $x_m$, $m = 1, \ldots, M$,
are assumed to be in a fixed design so that in practice $t_m \in \mathcal{T}_m$ is a grid representing time or
space. Suppose that each of the $M$ functional predictors can be expressed as:

$$x_{im}(t_m) = \sum_{j=1}^{p_m} w_{imj}\, \phi_{mj}(t_m) = w_{im}^T \phi_m(t_m), \quad i = 1, \ldots, n, \ t_m \in \mathcal{T}_m, \qquad (1)$$

where $w_{im} = (w_{im1}, \ldots, w_{imp_m})^T$ are the vectors of coefficients and $\phi_m(t_m) = (\phi_{m1}(t_m), \ldots, \phi_{mp_m}(t_m))^T$
are vectors of B-splines basis functions. The basis functions and the $p_m$ coefficients in (1) are assumed
to be determined prior to the regression modeling through smoothing methods. In general this finite
B-splines representation of a functional predictor is a good approximation of smooth functions, such as
functions in the Sobolev Space (see Reif, 1997).

We consider the functional regression model (Ramsay and Silverman, 2005) given by 

$$y_i = \beta_0 + \sum_{m=1}^{M} \int_{\mathcal{T}_m} x_{im}(t_m)\, \beta_m(t_m)\, dt_m + \varepsilon_i, \qquad (2)$$

where $\beta_0$ is a constant, $\varepsilon_i$, $i = 1, \ldots, n$, are i.i.d. Gaussian noises with mean 0 and constant variance $\sigma^2$,
and $\beta_m(t_m)$ are functional coefficients that we assume can be represented through the basis expansion

$$\beta_m(t_m) = \sum_{j=1}^{p_m} b_{mj}\, \phi_{mj}(t_m) = b_m^T \phi_m(t_m), \quad m = 1, \ldots, M, \ t_m \in \mathcal{T}_m, \qquad (3)$$

for the parameter vectors $b_m = (b_{m1}, \ldots, b_{mp_m})^T$. Thus the FRM in (2) can be re-expressed as a linear
model in the following way

$$\begin{aligned}
y_i &= \beta_0 + \sum_{m=1}^{M} \int_{\mathcal{T}_m} w_{im}^T \phi_m(t_m)\, \phi_m(t_m)^T b_m\, dt_m + \varepsilon_i
     = \beta_0 + \sum_{m=1}^{M} w_{im}^T \left( \int_{\mathcal{T}_m} \phi_m(t_m)\, \phi_m(t_m)^T dt_m \right) b_m + \varepsilon_i \\
    &= \beta_0 + \sum_{m=1}^{M} w_{im}^T J_{\phi_m} b_m + \varepsilon_i = z_i^T b + \varepsilon_i,
\end{aligned}$$

or in matrix form $Y = Zb + \varepsilon$, where $z_i = (1, w_{i1}^T J_{\phi_1}, \ldots, w_{iM}^T J_{\phi_M})^T$, $b = (\beta_0, b_1^T, \ldots, b_M^T)^T$,
$Z = (z_1, \ldots, z_n)^T$, $J_{\phi_m} = \int_{\mathcal{T}_m} \phi_m(t_m)\, \phi_m(t_m)^T dt_m$ are $p_m \times p_m$ cross product matrices and $\varepsilon$ is the vector
of error terms. Since we adopt B-splines basis expansions, the cross product matrix can be easily
computed using the procedure in Kayano and Konishi (2009).
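As a rough illustration of the cross product matrices $J_{\phi_m}$, the following Python sketch evaluates a clamped cubic B-spline basis with the Cox-de Boor recursion and approximates $J_{\phi_m}$ by a midpoint rule. It is not the procedure of Kayano and Konishi (2009); the function names, the equally spaced interior knots and the quadrature size are assumptions made for this example only.

```python
def clamped_knots(a, b, n_basis, degree=3):
    """Clamped knot vector on [a, b] with equally spaced interior knots (an assumption)."""
    n_interior = n_basis - degree - 1
    step = (b - a) / (n_interior + 1)
    interior = [a + step * (i + 1) for i in range(n_interior)]
    return [a] * (degree + 1) + interior + [b] * (degree + 1)


def bspline_basis(t, knots, degree=3):
    """Values of all B-spline basis functions at t (Cox-de Boor recursion)."""
    n_int = len(knots) - 1
    # Degree-0 indicators on the half-open intervals [knots[j], knots[j+1]).
    vals = [1.0 if knots[j] <= t < knots[j + 1] else 0.0 for j in range(n_int)]
    if t == knots[-1]:
        # Close the last non-empty interval so the right endpoint is covered.
        for j in range(n_int - 1, -1, -1):
            if knots[j] < knots[j + 1]:
                vals[j] = 1.0
                break
    for d in range(1, degree + 1):
        new_vals = []
        for j in range(n_int - d):
            left_den = knots[j + d] - knots[j]
            right_den = knots[j + d + 1] - knots[j + 1]
            left = (t - knots[j]) / left_den * vals[j] if left_den > 0 else 0.0
            right = (knots[j + d + 1] - t) / right_den * vals[j + 1] if right_den > 0 else 0.0
            new_vals.append(left + right)
        vals = new_vals
    return vals  # length len(knots) - degree - 1


def cross_product_matrix(knots, degree=3, n_quad=2000):
    """Midpoint-rule approximation of J = integral of phi(t) phi(t)^T over [a, b]."""
    a, b = knots[0], knots[-1]
    h = (b - a) / n_quad
    p = len(knots) - degree - 1
    J = [[0.0] * p for _ in range(p)]
    for q in range(n_quad):
        phi = bspline_basis(a + (q + 0.5) * h, knots, degree)
        for j in range(p):
            for k in range(p):
                J[j][k] += h * phi[j] * phi[k]
    return J


knots = clamped_knots(0.0, 1.0, n_basis=6)  # 6 cubic basis functions on [0, 1]
J = cross_product_matrix(knots)
```

Because the basis functions sum to one at every point, the entries of `J` add up to the length of the interval, which gives a quick sanity check on the quadrature.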
3. Methodology


3.1. Testing procedure 

In this section we address the problem of testing the relevance of an individual functional predictor 
in the multivariate FRM. We consider testing the $r$-th ($r \in \{1, \ldots, M\}$) predictor through the following
null hypothesis

$$H_0^r : b_r = 0 \quad \text{vs} \quad H_a^r : b_r \neq 0. \qquad (4)$$

In linear models with normal errors, least squares estimates, which minimize the residual sum of squares, 
are equivalent to maximum likelihood estimates. For ease of notation, in this section, we omit from all 
statistics the index $r$ that identifies the predictor being tested. Let $\omega$ and $\Omega$ denote the spaces generated
by the predictors under $H_0$ and $H_a$ respectively. Note that $\omega \subset \Omega$ and hence $\mathrm{rank}(\Omega) = 1 + \sum_{m=1}^{M} p_m := k$
and $\mathrm{rank}(\omega) = k - p_r = 1 + \sum_{m=1}^{M} p_m - p_r := k_0$. We assume throughout this paper that the matrix $Z$
has full rank, that is, $Z$ has $k < n$ linearly independent columns (see also condition (C1) in Section 3.2).
This assumption guarantees the existence and uniqueness of the least squares estimators. Let $\mathrm{RSS}_0$ and
$\mathrm{RSS}$ denote the residual sum of squares under $H_0$ and $H_a$ respectively, that is,

$$\mathrm{RSS}_0 = \sum_{i=1}^{n} \left( y_i - z_i^T \hat{b}^0 \right)^2 \quad \text{and} \quad \mathrm{RSS} = \sum_{i=1}^{n} \left( y_i - z_i^T \hat{b} \right)^2, \qquad (5)$$

where $\hat{b}^0 = \hat{b} - (Z^T Z)^{-1} A^T \left( A (Z^T Z)^{-1} A^T \right)^{-1} A \hat{b}$ for a $p_r \times k$ matrix $A$ defining the null hypothesis, i.e.,
$Ab = 0$ implies $b_r = 0$.

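For insight into the distribution of the test statistic and the non-centrality parameter presented below,
it is useful to express the sums of squares $\mathrm{RSS}_0$ and $\mathrm{RSS}$ as quadratic forms. We write $\hat{Y}_0 = Z\hat{b}^0 = P_0 Y$
and $\hat{Y} = Z\hat{b} = PY$, where $P_0$ and $P$ are the orthogonal projection matrices which project $Y$ onto the
spaces $\omega$ and $\Omega$, respectively. We can then rewrite the residual sums of squares as $\mathrm{RSS}_0 = Y^T (I_n - P_0) Y$
and $\mathrm{RSS} = Y^T (I_n - P) Y$, so that $\mathrm{RSS}_0 - \mathrm{RSS} = Y^T (P - P_0) Y$. Since

$$\frac{\mathrm{RSS}_0}{\sigma^2} \overset{H_0}{\sim} \chi^2_{n-k_0} \quad \text{and} \quad \frac{\mathrm{RSS}}{\sigma^2} \overset{H_0}{\sim} \chi^2_{n-k},$$

in order to test $H_0$ in (4) we use the likelihood ratio statistic

$$T^L = -2 \ln \frac{L_0}{L} = -2 \left[ -\frac{1}{2\hat{\sigma}^2} \mathrm{RSS}_0 + \frac{1}{2\hat{\sigma}^2} \mathrm{RSS} \right] = \frac{\mathrm{RSS}_0 - \mathrm{RSS}}{\hat{\sigma}^2} \overset{H_0}{\longrightarrow} \chi^2_{k-k_0} \qquad (6)$$

in distribution, with $\hat{\sigma}^2 = \mathrm{RSS}/n \xrightarrow{p} \sigma^2$ the maximum likelihood estimator of $\sigma^2$. From the Normality
assumption of the residuals and the fact that

$$\frac{1}{\sigma^2} E\left[ \mathrm{RSS}_0 - \mathrm{RSS} \right] = \frac{1}{\sigma^2} \left[ \sigma^2\, \mathrm{Tr}(P - P_0) + (Zb)^T (P - P_0) Zb \right] = (k - k_0) + \delta = p_r + \delta,$$

where

$$\delta = b^T Z^T (P - P_0) Z b / \sigma^2, \qquad (7)$$

the following proposition can be established.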
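Proposition 3.1. (Theorem 5.3c in Rencher and Schaalje, 2008) Let $\mathrm{RSS}$ and $\mathrm{RSS}_0$ be defined as in
(5). Then, under the alternative hypothesis in (4),

$$\frac{\mathrm{RSS}_0}{\sigma^2} \overset{H_a}{\sim} \chi^2_{n-k_0}(\delta) \quad \text{and} \quad \frac{\mathrm{RSS}}{\sigma^2} \overset{H_a}{\sim} \chi^2_{n-k},$$

so that

$$\frac{\mathrm{RSS}_0 - \mathrm{RSS}}{\sigma^2} \overset{H_a}{\sim} \chi^2_{k-k_0}(\delta).$$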
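The statistic in (6) only needs two least squares fits. The sketch below, written in plain Python so that it is self-contained, obtains the restricted fit by dropping the columns of the tested block (which imposes $b_r = 0$) and returns $(\mathrm{RSS}_0 - \mathrm{RSS})/\hat{\sigma}^2$ with $\hat{\sigma}^2 = \mathrm{RSS}/n$. The helper names `solve`, `rss` and `lr_statistic`, and the toy data, are hypothetical and not taken from the paper.

```python
def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def rss(Z, y):
    """Residual sum of squares of the least squares fit of y on the columns of Z."""
    k = len(Z[0])
    ZtZ = [[sum(row[a] * row[b] for row in Z) for b in range(k)] for a in range(k)]
    Zty = [sum(row[a] * yi for row, yi in zip(Z, y)) for a in range(k)]
    b_hat = solve(ZtZ, Zty)  # normal equations; assumes Z has full column rank
    fitted = [sum(z * bj for z, bj in zip(row, b_hat)) for row in Z]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def lr_statistic(Z, y, tested_cols):
    """T^L = (RSS_0 - RSS) / sigma_hat^2, with sigma_hat^2 = RSS / n."""
    n = len(y)
    keep = [j for j in range(len(Z[0])) if j not in tested_cols]
    Z0 = [[row[j] for j in keep] for row in Z]  # restricted design (tested block removed)
    rss_full, rss_null = rss(Z, y), rss(Z0, y)
    return (rss_null - rss_full) / (rss_full / n)


# Toy data: the response depends on column 1 but not on column 2.
Z = [[1.0, float(i), float((i * i) % 5)] for i in range(8)]
noise = [0.1, -0.2, 0.05, 0.15, -0.1, 0.2, -0.15, -0.05]
y = [2.0 + 3.0 * row[1] + e for row, e in zip(Z, noise)]
T_relevant = lr_statistic(Z, y, {1})
T_irrelevant = lr_statistic(Z, y, {2})
```

In this toy example the statistic for the relevant column is several orders of magnitude larger than the one for the irrelevant column, which is the behaviour the non-centrality shift in Proposition 3.1 describes.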


Lemma 3.2 specifies the order of the non-centrality parameter of the distribution of $(\mathrm{RSS}_0 - \mathrm{RSS})/\sigma^2$.
Growing at the order of the sample size, multiplied by the magnitude of the parameter being tested,
the shift produced by the non-centrality parameter under $H_a$ provides evidence for rejecting the null
hypothesis. Using this result, Theorem 3.5 shows the consistency of the proposed variable selection
procedure, which is described in Section 3.2.


Lemma 3.2. Let $T^L_r$ be the likelihood ratio test statistic defined in (6) for testing $H_0^r$ in (4). Under the
alternative hypothesis, the non-centrality parameter $\delta$ defined in (7) is of order $\delta \sim c(n - k_0)$, for a
constant $c$.
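To see the linear growth of the non-centrality parameter in the simplest possible case, consider a toy model that is not part of the paper: an intercept-only null model and a single scalar covariate $z_i$ on a regular grid in $(0, 1)$ with coefficient `beta`. Then $(I - P_0) Z b$ is the centred signal and $\delta = \beta^2 \sum_i (z_i - \bar{z})^2 / \sigma^2$. The function name and default values below are assumptions for illustration.

```python
def noncentrality(n, beta=1.0, sigma=1.0):
    """delta for the toy model y_i = beta_0 + beta * z_i + eps_i, testing beta = 0."""
    z = [(i + 0.5) / n for i in range(n)]  # regular grid in (0, 1)
    z_bar = sum(z) / n
    # (I - P_0) Z b is the centred signal beta * (z_i - z_bar).
    return beta ** 2 * sum((zi - z_bar) ** 2 for zi in z) / sigma ** 2


for n in (100, 1000, 10000):
    # The ratio delta / n settles near beta^2 / (12 sigma^2).
    print(n, noncentrality(n) / n)
```

The ratio $\delta / n$ stabilises at $\beta^2 / (12\sigma^2)$ in this design, so $\delta$ grows linearly in $n$ with a constant driven by the size of the tested coefficient.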


3.2. Consistent test based variable selection 

In this section we describe a test-based variable selection method which is shown to consistently 
identify the set of relevant predictors. A similar procedure was used by Bunea, Wegkamp and Auguste 
(2006) in the linear model setting, and by Zambom and Akritas (2014) for a nonparametric model.

Let $I_M = \{1, \ldots, M\}$ denote the set of indices of the $M$ available functional predictors. Assume that
the true underlying model is sparse in the sense that only a few predictors significantly relate to the
response variable, while $M$ is allowed to grow with $n$ at a rate such that the following condition holds

$$\text{Condition (C1)}: \quad k = 1 + \sum_{m=1}^{M} p_m < \sqrt{n}/\log(n).$$

Let $I_0 = \{m_1, \ldots, m_{M_0}\}$ denote the (unknown) subset of indices corresponding to the $M_0$ significant
predictors. The objective of the proposed variable selection method is to identify the subset $I_0$, that is,
to determine the set of functional variables with predictive significance.

Let $T^L_r$, $r = 1, \ldots, M$, denote the likelihood ratio test statistic defined in (6) for testing $H_0^r$ in (4) and

$$\pi_r = 1 - \Psi(T^L_r) \qquad (8)$$

the corresponding p-value, where $\Psi(\cdot)$ is the cumulative distribution function of the $\chi^2_{p_r}$ distribution. The Bonferroni
method yields $\hat{I} = \{m : \pi_m \le q/M\}$ as the estimate of $I_0$. The false discovery rate (FDR) procedure
(Benjamini and Yekutieli, 2001) computes

$$s = \max \left\{ j : \pi_{(j)} \le \frac{j}{M} \frac{q}{\sum_{i=1}^{M} i^{-1}} \right\}, \qquad (9)$$

where $\pi_{(1)} \le \cdots \le \pi_{(M)}$ denote the ordered p-values and $q$ is the choice of level, and rejects $H_0^{(j)}$,
$j = 1, \ldots, s$. If no such $s$ exists, no hypothesis is rejected. The proposed variable selection method selects
the predictors with indices corresponding to the $s$ rejected null hypotheses. Hence, $I_0$ is estimated by the
set $\hat{I}$ of indices corresponding to the first $s$ ordered p-values.
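The selection step can be sketched in a few lines of Python. The code below computes p-values as in (8) using the closed form of the chi-square upper tail for even degrees of freedom (for example $p_r = 6$ basis functions), then applies the Bonferroni rule and the step-up rule in (9). The function names are hypothetical, indices are zero-based, and the restriction to even degrees of freedom is a simplification of this sketch.

```python
import math


def chi2_sf_even(x, df):
    """Upper tail 1 - Psi_df(x) of the central chi-square law; df must be even."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles even, positive df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for l in range(1, df // 2):
        term *= half / l  # (x/2)^l / l!
        total += term
    return math.exp(-half) * total


def bonferroni_select(pvals, q):
    """Indices m with pi_m <= q / M."""
    M = len(pvals)
    return sorted(m for m, p in enumerate(pvals) if p <= q / M)


def by_select(pvals, q):
    """Indices of the s smallest p-values, with s the largest j such that
    pi_(j) <= (j / M) * q / sum_{i=1}^{M} 1/i, as in (9)."""
    M = len(pvals)
    harmonic = sum(1.0 / i for i in range(1, M + 1))
    order = sorted(range(M), key=lambda m: pvals[m])
    s = 0
    for j, m in enumerate(order, start=1):
        if pvals[m] <= j * q / (M * harmonic):
            s = j  # keep the largest j that meets the bound
    return sorted(order[:s])


# Hypothetical test statistics for M = 4 predictors with p_r = 6 each.
stats = [40.0, 3.0, 35.0, 6.0]
pvals = [chi2_sf_even(t, 6) for t in stats]
selected_fdr = by_select(pvals, q=0.05)
selected_bc = bonferroni_select(pvals, q=0.05)
```

Both rules return an empty list when no p-value passes its threshold, matching the convention that no hypothesis is rejected if $s$ does not exist.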

Let us now prove the consistency of the proposed variable selection method. Let $R$ denote the total
number of rejected hypotheses, so we have that $R = s\,1(s \text{ in (9) exists})$, where $1(\cdot)$ is the indicator
function. Now, let $V$ be the number of falsely rejected hypotheses, and set $Q = (V/R)\,1(R > 0)$ for the
proportion of falsely rejected hypotheses. By definition, the FDR is $E(Q)$, and $E(Q) \le q(M - M_0)/M \le q$
(Benjamini and Yekutieli, 2001). We call a procedure, and the estimated set $\hat{I}$, consistent if
$P(\hat{I} = I_0) \to 1$ as $n \to \infty$. Theorem 3.5, in connection with Lemmas 3.2 - 3.4, shows the consistency of $\hat{I}$.

Lemma 3.3. Let $T^L_r$ and $\pi_r = 1 - \Psi(T^L_r)$ be the test statistic and the p-value defined as in (6) and (8)
for testing $H_0^r$. Assume condition (C1) holds and define $A_n = \{|\hat{\sigma} - \sigma| < \sqrt{\log(n)/n}\}$.

(a) For $r \notin I_0$ and any $0 < \gamma < 1$, we have $P(\{\pi_r \le \gamma\} \cap A_n) = \gamma + O(\sqrt{\log(n)/n})$.

(b) For $r \in I_0$ and $0 < \gamma < 1$, as $n \to \infty$, if $\gamma > 1/n$, we have
$P(\{\pi_r > \gamma\} \cap A_n) = O(\gamma) + O(\sqrt{\log(n)/n})$.

Lemma 3.4. Let $\Gamma_n$ be the event where the smallest $M_0$ p-values defined in (8) are the p-values corresponding
to the $M_0$ significant functional predictors, with $I_0 = \{m_1, \ldots, m_{M_0}\}$, that is

$$\Gamma_n = \left[ \{\pi_{(1)}, \ldots, \pi_{(M_0)}\} = \{\pi_{m_1}, \ldots, \pi_{m_{M_0}}\} \right].$$

Then, if condition (C1) holds, $\lim_{n \to \infty} P(\Gamma_n) = 1$.

Theorem 3.5. Let $\delta$ be the non-centrality parameter defined in (7) and $q$ the chosen bound of FDR in
(9) or in Bonferroni corrections. Assume that condition (C1) holds and $q \to 0$ as $n \to \infty$, in such a way
that $q > M \left( \sum_{i=1}^{M} i^{-1} \right) / (M_0 n)$ and $Mq/\log(M) \to 0$. Then, $\lim_{n \to \infty} P(\hat{I} = I_0) = 1$.

Note that the choice of $q \to 0$ is important for the consistency of the proposed method. For real
datasets, a rule of thumb is to choose $q = O(1/M)$ if $M$ is large relative to the sample size $n$, otherwise
choose $q = O(1/\sqrt{n})$. These choices guarantee the consistency of the variable selection while satisfying
all assumptions and conditions. In the simulation study we explore different choices of this parameter. 

4. Numerical simulations 

Simulation studies were conducted to evaluate the finite sample performance of the proposed 
variable selection procedure. The Monte Carlo simulations in this section are based on 100 and 300 generated
observations of six functional covariates and a scalar response $\{(x_{im}(t), y_i);\ t \in \mathcal{T}_m, i = 1, \ldots, n, m = 1, \ldots, 6\}$,
extending the simulation set up in Matsui and Konishi (2011) by including three extra functional
predictors. We compared the performance of the proposed variable selection procedure with that of group
SCAD and group LASSO proposed by Matsui and Konishi (2011), and the Generalized Functional Linear
Model (GFLM) method in Gertheiss et al. (2013) with adaptive penalization. For comparison purposes,
we used 6 basis functions for the estimation of the predictors and the functional parameters $\beta(\cdot)$ in all
methods. First, we generated $z_{im}$ corresponding to the predictor $X_m$ in an equally spaced grid of 50
points in $\mathcal{T}_m$ in the following way:

$$z_{im} = u_{im} + \varepsilon_{im}, \quad \varepsilon_{im} \sim N(0, (0.025\, r_{x_{im}})^2),$$

where $r_{x_{im}} = \max(u_{im}(t_m)) - \min(u_{im}(t_m))$ and

$$\begin{aligned}
u_{i1}(t) &= \cos(2\pi(t - a_1)) + a_2, & \mathcal{T}_1 &= [0, 1], & a_1 &\sim N(-4, 3^2),\ a_2 \sim N(7, 1.5^2), \\
u_{i2}(t) &= b_1 \sin(\pi t) + b_2, & \mathcal{T}_2 &= [0, \pi/3], & b_1 &\sim U(3, 7),\ b_2 \sim N(0, 1), \\
u_{i3}(t) &= c_1 t^3 + c_2 t^2 + c_3 t, & \mathcal{T}_3 &= [-1, 1], & c_1 &\sim N(-3, 1.2^2),\ c_2 \sim N(2, 0.5^2),\ c_3 \sim N(-2, 1), \\
u_{i4}(t) &= \sin(2(t - d_1)) + d_2 t, & \mathcal{T}_4 &= [0, \pi/3], & d_1 &\sim N(-2, 1),\ d_2 \sim N(3, 1.5^2), \\
u_{i5}(t) &= e_1 \cos(2t) + e_2 t, & \mathcal{T}_5 &= [-2, 1], & e_1 &\sim U(2, 7),\ e_2 \sim N(2, 0.4^2), \\
u_{i6}(t) &= f_1 e^{-t/3} + f_2 t + f_3, & \mathcal{T}_6 &= [-1, 1], & f_1 &\sim N(4, 2^2),\ f_2 \sim N(-3, 0.5^2),\ f_3 \sim N(1, 1).
\end{aligned}$$


The scalar response $y_i$ was generated as $y_i = g(u_i) + \varepsilon_i$, where

$$g(u_i) = \sum_{m=1}^{6} \int_{\mathcal{T}_m} u_{im}(t)\, \beta_m(t)\, dt, \quad \varepsilon_i \sim N(0, (0.05\, R_{y_i})^2)$$

and $R_{y_i} = \max(g(u_i)) - \min(g(u_i))$. For a constant $c = 0$, $0.4$ and $0.8$, the coefficient
functions $\beta_m(t)$ are given by

$$\beta_1(t) = \sin(t), \quad \beta_2(t) = \sin(2t), \quad \beta_3(t) = -c t^2, \quad \beta_4(t) = \sin(2t), \quad \beta_5(t) = c \sin(\pi t), \quad \beta_6(t) = 0.$$

Note that if $c = 0$ the true model specifies that only $u_1$, $u_2$ and $u_4$ significantly relate to the response,
corresponding to the predictors $X_1$, $X_2$ and $X_4$.
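The data-generating mechanism above can be sketched with the Python standard library. The code draws the six curves per subject on a 50-point grid, adds observation noise with standard deviation 0.025 times the range of each curve, and builds the response by trapezoidal integration. The function names, the trapezoidal rule, the seed handling and the reading of $R_{y_i}$ as the range of $g$ over the sample are assumptions of this sketch.

```python
import math
import random

DOMAINS = [(0.0, 1.0), (0.0, math.pi / 3), (-1.0, 1.0),
           (0.0, math.pi / 3), (-2.0, 1.0), (-1.0, 1.0)]


def grid(a, b, n_points=50):
    return [a + (b - a) * k / (n_points - 1) for k in range(n_points)]


def draw_curves(rng):
    """Noise-free curves u_i1, ..., u_i6 of one subject, returned as callables."""
    a1, a2 = rng.gauss(-4, 3), rng.gauss(7, 1.5)
    b1, b2 = rng.uniform(3, 7), rng.gauss(0, 1)
    c1, c2, c3 = rng.gauss(-3, 1.2), rng.gauss(2, 0.5), rng.gauss(-2, 1)
    d1, d2 = rng.gauss(-2, 1), rng.gauss(3, 1.5)
    e1, e2 = rng.uniform(2, 7), rng.gauss(2, 0.4)
    f1, f2, f3 = rng.gauss(4, 2), rng.gauss(-3, 0.5), rng.gauss(1, 1)
    return [
        lambda t: math.cos(2 * math.pi * (t - a1)) + a2,
        lambda t: b1 * math.sin(math.pi * t) + b2,
        lambda t: c1 * t ** 3 + c2 * t ** 2 + c3 * t,
        lambda t: math.sin(2 * (t - d1)) + d2 * t,
        lambda t: e1 * math.cos(2 * t) + e2 * t,
        lambda t: f1 * math.exp(-t / 3) + f2 * t + f3,
    ]


def betas(c):
    """Coefficient functions beta_1, ..., beta_6 for a given constant c."""
    return [math.sin, lambda t: math.sin(2 * t), lambda t: -c * t ** 2,
            lambda t: math.sin(2 * t), lambda t: c * math.sin(math.pi * t),
            lambda t: 0.0]


def trapezoid(values, ts):
    return sum((ts[k + 1] - ts[k]) * (values[k] + values[k + 1]) / 2
               for k in range(len(ts) - 1))


def simulate(n, c, seed=0):
    rng = random.Random(seed)
    grids = [grid(a, b) for a, b in DOMAINS]
    beta = betas(c)
    z_obs, g = [], []
    for _ in range(n):
        curves = draw_curves(rng)
        u = [[curves[m](t) for t in grids[m]] for m in range(6)]
        # g(u_i): sum over predictors of the integral of u_im(t) * beta_m(t).
        g.append(sum(
            trapezoid([um * beta[m](t) for um, t in zip(u[m], grids[m])], grids[m])
            for m in range(6)))
        # Noisy discrete observations z_im = u_im + eps_im.
        z_obs.append([[v + rng.gauss(0, 0.025 * (max(um) - min(um))) for v in um]
                      for um in u])
    sd_y = 0.05 * (max(g) - min(g))
    y = [gi + rng.gauss(0, sd_y) for gi in g]
    return z_obs, y


z_obs, y = simulate(n=100, c=0.4, seed=1)
```

Here `z_obs[i][m]` holds the 50 noisy observations of predictor `m` for subject `i`; smoothing these into B-spline coefficients is a separate step.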

As the first step of our analysis, the random data $z_{im}$ was converted into the functional data $x_{im}$
using B-splines basis smoothing. For these data, we assumed the functional regression model

$$y_i = \sum_{m=1}^{6} \int_{\mathcal{T}_m} x_{im}(t)\, \beta_m(t)\, dt + \varepsilon_i,$$

and applied the proposed variable selection method described in Section 3. With 100 Monte Carlo
simulations, we computed the number of correctly selected models and the averages of the mean square
errors (AMSE) for the proposed method with FDR and Bonferroni corrections, as well as for group
LASSO, group SCAD and GFLM. The results in Table 1 suggest that when the sample size is relatively
small ($n = 100$), all methods select the correct model about the same number of times;
however, as the sample size increases, the proposed variable selection procedure outperforms group SCAD,
group LASSO and the GFLM. We note that restrictive choices of level for the tests tend to yield better
results for the proposed method; for example, the choice of $q = 0.01$ delivers the
highest number of correct model selections. For $c = 0$ or $c = 0.8$, group SCAD and group LASSO have
AMSE similar to that of the proposed procedure. However, for predictors included in the model with
low significance ($c = 0.4$), the AMSE of group SCAD and group LASSO are about double the AMSE
achieved by our procedure, while the GFLM delivers the highest AMSE in all models.
Table 1: Number of correctly selected models and AMSE (in parentheses). The columns .01, .05 and .1 give the level q.

c    n    measure   T^L_BC                     T^L_FDR                    SCAD              LASSO             GFLM
                    .01      .05      .1       .01      .05      .1       GCV      BIC      GCV      BIC
0    100  correct   88       79       65       87       74       58       82       82       80       83       77
          AMSE      (2.07)   (2.04)   (2.01)   (2.06)   (2.05)   (1.97)   (1.45)   (1.45)   (1.19)   (1.30)   (8.94)
     300  correct   96       92       88       95       89       83       85       85       84       86       83
          AMSE      (1.93)   (1.98)   (1.89)   (1.92)   (1.97)   (1.91)   (1.31)   (1.31)   (1.04)   (1.16)   (8.51)
.4   100  correct   79       79       78       82       80       73       79       79       65       65       76
          AMSE      (2.61)   (2.98)   (2.77)   (2.88)   (3.01)   (2.82)   (5.60)   (5.60)   (5.67)   (5.70)   (11.37)
     300  correct   96       94       90       95       92       88       83       83       71       80       84
          AMSE      (2.57)   (2.90)   (2.74)   (2.87)   (2.91)   (2.79)   (5.58)   (5.58)   (5.64)   (5.59)   (10.78)
.8   100  correct   83       81       80       83       81       79       83       83       72       74       83
          AMSE      (7.15)   (7.96)   (7.92)   (7.42)   (7.87)   (7.78)   (7.41)   (7.41)   (7.14)   (7.87)   (13.49)
     300  correct   98       96       93       99       95       92       93       93       80       82       94
          AMSE      (7.08)   (7.10)   (7.01)   (7.09)   (7.11)   (7.14)   (7.27)   (7.27)   (7.17)   (7.32)   (12.05)


5. Real Data Example: Weather Data 

In this application, we consider weather data observed monthly at 79 weather stations in Japan. 
The data set was obtained from http://www.data.jma.go.jp/obd/stats/data/en/, and includes monthly 
and annual total observations averaged from 1971 to 2000: monthly observed average temperatures 
(TEMP), average atmospheric pressure (PRESS), time of daylight (LIGHT), average humidity (HUMID),
maximum temperature (MAX.TEMP), minimum temperature (MIN.TEMP) and annual total
precipitation. The dataset used in this analysis does not correspond to the one used in Matsui and 
Konishi (2011), rather we selected the 79 most reliable stations according to the aforementioned website. 

The functional predictors, observed at a grid of 1 to 12 points, were fitted using 6 B-splines basis 
functions. Figure 1 shows examples of the fitted functional predictors. The goal of this application
is to select the functional covariates that significantly relate to annual total precipitation. We applied 
the proposed variable selection method and compared the results with those of the group SCAD, group 
LASSO and GFLM selection procedures, using the same number of basis functions. 

[Figure 1 about here] 

Figure 1: Examples of smoothed functional covariates from weather data 

The selected functional predictors for each method are shown in Table 2. Humidity and maximum
temperature are selected by all methods except GFLM. However, unlike group SCAD and group
LASSO, the proposed procedure and GFLM selected PRESS, and only the proposed procedure did not
select LIGHT. Atmospheric pressure is well known among meteorologists to be related to precipitation.
Low and high air pressure systems are usually caused by unequal heating across the surface of the planet.
A low pressure system is an area where the atmospheric pressure is lower than that of the area around it.
The production of clouds and consequent precipitation are hence related to the wind, warm air and
atmospheric lifting caused by low pressure systems.


Table 2: Selected predictors for the weather dataset example 


Method    Selected
T^L       PRESS, HUM, MAX.T
SCAD      LIGHT, HUM, MAX.T
LASSO     TEMP, LIGHT, HUM, MAX.T
GFLM      TEMP, PRESS, LIGHT


In a simulation of 100 bootstrap samples from the weather data, we performed variable selection
using the proposed method, group SCAD, group LASSO and GFLM. Table 3 shows the proportion of
times each predictor was selected. While LIGHT was the third most selected predictor by group SCAD
and group LASSO (about 70% of the time) and the most selected by GFLM, it was only the fourth
most selected predictor when using the proposed procedure. On the other hand, pressure was selected
most frequently by the proposed method, followed by humidity and maximum temperature. Our results
meet the expectations of most specialized meteorology literature, which finds a significant relation of
pressure, humidity and maximum temperature with annual precipitation.
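The bootstrap selection ratios can be tallied with a small generic loop. In the sketch below, `select` stands for any selection routine that maps a resampled data set to a list of selected predictor indices; it is a hypothetical placeholder, as are the function name and the seed argument.

```python
import random


def bootstrap_selection_ratio(data, select, n_predictors, n_boot=100, seed=0):
    """Proportion of bootstrap samples in which each predictor is selected.

    data: list of observations (for example, one entry per weather station).
    select: callable returning the indices of the selected predictors.
    """
    rng = random.Random(seed)
    counts = [0] * n_predictors
    n = len(data)
    for _ in range(n_boot):
        # Resample the observations with replacement.
        sample = [data[rng.randrange(n)] for _ in range(n)]
        for m in select(sample):
            counts[m] += 1
    return [count / n_boot for count in counts]


# Dummy selector that always picks predictor 0, for illustration only.
ratios = bootstrap_selection_ratio(list(range(79)), lambda sample: [0], n_predictors=6)
```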


Table 3: Ratio of selection on 100 bootstrap samples of weather data 


Method         TEMP    PRESS   LIGHT   HUM     MAX.T   MIN.T
T^L (BC)       0.38    0.90    0.56    0.89    0.87    0.41
T^L (FDR)      0.40    0.90    0.58    0.87    0.86    0.45
SCAD (GCV)     0.37    0.23    0.65    0.81    0.81    0.24
SCAD (BIC)     0.37    0.21    0.75    0.81    0.83    0.23
LASSO (GCV)    0.45    0.35    0.62    0.78    0.80    0.25
LASSO (BIC)    0.45    0.34    0.75    0.81    0.81    0.23
GFLM           0.73    0.67    0.79    0.47    0.47    0.21
Appendix

Proof of Lemma 3.2

Since $(P - P_0)$ is idempotent, it is easy to show that the non-centrality parameter $\delta$ is equal to

$$\delta = b^T Z^T (P - P_0) Z b / \sigma^2 = \|Zb - P_0 Zb\|^2 / \sigma^2.$$

Note that $E(Y|Z) = Zb$ is the vector of expected values conditional on $Z$, which belongs to the subspace
$\Omega$, and $P_0 Zb$ is its projection onto the restricted subspace $\omega$. Without loss of generality write
$Zb = (Z^0, Z^1)(b_{-r}^T, b_r^T)^T$, where $Z^1$ is the sub-matrix of $Z$ with columns corresponding to the parameters $b_r$, and
$Z^0$ the remaining columns (similarly for $b_{-r}$). Let $\tilde{Y} = Zb$ so that $(P - P_0)Zb = \tilde{Y} - P_0\tilde{Y} = (I - P_0)\tilde{Y}$.
The quantity $(I - P_0)\tilde{Y}$ is the vector of residuals from the projection of $\tilde{Y}$ onto the subspace $\omega$. This can be
viewed as a linear model $\tilde{Y} = E(\tilde{Y}|Z^0) + \xi$, so that the mean squared error $\|(I - P_0)\tilde{Y}\|^2/(n - k_0) =
\tilde{Y}^T (I - P_0) \tilde{Y}/(n - k_0) = \delta\sigma^2/(n - k_0)$ will converge to a constant. This implies that $\delta \sim c(n - k_0)$.

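Proof of Lemma 3.3

Part (a) Let $\Psi_{p_r}(\cdot)$ be the cumulative distribution function (c.d.f.) of the central $\chi^2_{p_r}$ distribution and
$\Psi_{p_r}^{-1}(\cdot)$ its inverse. Also, denote the residual sum of squares under hypothesis $H_0^r$ in (4) by $\mathrm{RSS}_0^r$. Using
the fact that $\lim_{n \to \infty} P(A_n) = 1$ (Lemma A.1 in Bunea et al., 2006), we obtain $\lim_{n \to \infty} P(|\hat{\sigma}^2 - \sigma^2| > \sigma a) = 0$
for $a = \sqrt{\log(n)/n}$. For all $r \notin I_0$, $b_r = 0$, and for any $0 < \gamma < 1$ we find that

$$\begin{aligned}
P(\{\pi_r \le \gamma\} \cap A_n) &= P(\{1 - \Psi_{p_r}(T^L_r) \le \gamma\} \cap A_n) = P(\{T^L_r \ge \Psi_{p_r}^{-1}(1 - \gamma)\} \cap A_n) \\
&= P\left( \left\{ \frac{\mathrm{RSS}_0^r - \mathrm{RSS}}{\hat{\sigma}^2} \ge \Psi_{p_r}^{-1}(1 - \gamma) \right\} \cap A_n \right)
\le P\left( \frac{\mathrm{RSS}_0^r - \mathrm{RSS}}{\sigma^2} \ge \left( 1 - \frac{a}{\sigma} \right) \Psi_{p_r}^{-1}(1 - \gamma) \right) = \gamma + O(a).
\end{aligned}$$

Part (b) Let $a = \sqrt{\log(n)/n}$. For all $0 < \gamma < 1$,

$$\begin{aligned}
P(\{\pi_r > \gamma\} \cap A_n) &= P(\{1 - \Psi_{p_r}(T^L_r) > \gamma\} \cap A_n) = P(\{T^L_r < \Psi_{p_r}^{-1}(1 - \gamma)\} \cap A_n) \\
&= P\left( \left\{ \frac{\mathrm{RSS}_0^r - \mathrm{RSS}}{\hat{\sigma}^2} < \Psi_{p_r}^{-1}(1 - \gamma) \right\} \cap A_n \right)
\le P\left( \frac{\mathrm{RSS}_0^r - \mathrm{RSS}}{\sigma^2} < \left( \frac{a}{\sigma} + 1 \right) \Psi_{p_r}^{-1}(1 - \gamma) \right).
\end{aligned}$$

Under the alternative, $(\mathrm{RSS}_0^r - \mathrm{RSS})/\sigma^2$ has a non-central chi-square distribution with $p_r$ degrees of
freedom and non-centrality parameter $\delta$, whose c.d.f. we denote by $\Psi_{p_r,\delta}(\cdot)$. Since $\delta \sim c(n - k_0)$ and
$k < \sqrt{n}/\log(n)$, we conservatively have $\delta \sim c(n - \sqrt{n}/\log(n))$. For $\gamma > 1/n$, as $n \to \infty$ and hence $\delta \to \infty$,
we have that

$$\begin{aligned}
\Psi_{p_r,\delta}\left( \Psi_{p_r}^{-1}(1 - \gamma) \right) &= \sum_{j=0}^{\infty} \frac{e^{-\delta/2} (\delta/2)^j}{j!}\, \Psi_{p_r + 2j}\left( \Psi_{p_r}^{-1}(1 - \gamma) \right) \\
&= \sum_{j=0}^{\infty} \frac{e^{-\delta/2} (\delta/2)^j}{j!} \left[ 1 - e^{-\Psi_{p_r}^{-1}(1 - \gamma)/2} \sum_{\ell=0}^{p_r/2 + j - 1} \frac{\left( \Psi_{p_r}^{-1}(1 - \gamma)/2 \right)^{\ell}}{\ell!} \right] = O(\gamma),
\end{aligned}$$

since the Poisson weights are shifted to larger values of $j$ at a rate of $\exp(n - \sqrt{n}/\log(n))$ while the
values of $\Psi_{p_r + 2j}(\Psi_{p_r}^{-1}(1 - \gamma))$ are shifted at a rate slower than $n$, for the choice of $\gamma$. (Note that even
if $\gamma$ was chosen to decrease at a slower rate than $\exp(-n) n^k$, the percentile $\Psi_{p_r}^{-1}(1 - \gamma)$ would increase
slower than a linear rate in $n$, and $\Psi_{p_r,\delta}(\Psi_{p_r}^{-1}(1 - \gamma))$ would be $o(1)$.) Hence $P(\{\pi_r > \gamma\} \cap A_n) \le
\Psi_{p_r,\delta}\left( \left( \frac{a}{\sigma} + 1 \right) \Psi_{p_r}^{-1}(1 - \gamma) \right) = O(\gamma) + O(a)$. □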
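The Poisson-mixture representation of the non-central chi-square c.d.f. used in this proof can be checked numerically. The sketch below evaluates the mixture for even degrees of freedom, where the central c.d.f. has a closed form, and shows the c.d.f. at a fixed percentile shrinking as $\delta$ grows. Function names, the truncation rule and the example values are assumptions; the code is meant for moderate $\delta$ only, since `exp(-delta / 2)` underflows for very large $\delta$.

```python
import math


def chi2_cdf_even(x, df):
    """Central chi-square c.d.f. for even df: 1 - exp(-x/2) * sum_{l < df/2} (x/2)^l / l!."""
    half = x / 2.0
    term, total = 1.0, 1.0
    for l in range(1, df // 2):
        term *= half / l
        total += term
    return 1.0 - math.exp(-half) * total


def ncx2_cdf_even(x, df, delta, max_terms=100000):
    """Non-central chi-square c.d.f. as a Poisson(delta/2) mixture of central c.d.f.s."""
    lam = delta / 2.0
    weight = math.exp(-lam)  # Poisson weight for j = 0
    total = 0.0
    for j in range(max_terms):
        total += weight * chi2_cdf_even(x, df + 2 * j)
        # Past the Poisson mode the weights decrease; stop once they are negligible.
        if j > lam and weight < 1e-16:
            break
        weight *= lam / (j + 1)
    return total


x95 = 12.59  # roughly the 0.95 quantile of the central chi-square law with 6 d.f.
values = [ncx2_cdf_even(x95, 6, delta) for delta in (0.0, 10.0, 100.0)]
```

With `delta = 0` the mixture reduces to the central c.d.f., and the values in `values` decrease as `delta` increases, which is the mechanism that drives the p-values of relevant predictors towards zero.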


Proof of Lemma 3.4

Since $\lim_{n \to \infty} P(A_n) = \lim_{n \to \infty} P(|\hat{\sigma} - \sigma| < a) = 1$, where $a = \sqrt{\log(n)/n}$, it suffices to show that
$\lim_{n \to \infty} P(\Gamma_n^c \cap A_n) = 0$. From Lemma 3.2, $\delta$ is of order $\sim cn$, so that for $\gamma = a$

$$\begin{aligned}
P(\Gamma_n^c \cap A_n) &\le \sum_{m \in I_0} \sum_{k \notin I_0} P(\{\pi_k \le \pi_m\} \cap A_n) \\
&\le \sum_{m \in I_0} \sum_{k \notin I_0} \left[ P(\{\pi_k \le \gamma\} \cap A_n) + P(\{\pi_m > \gamma\} \cap A_n) \right] \\
&\le \sum_{m \in I_0} \sum_{k \notin I_0} \left[ \gamma + O(a) + O(\gamma) \right] = M_0 (M - M_0) \left[ \gamma + O(a) + O(\gamma) \right],
\end{aligned}$$

where the last inequality follows from Lemma 3.3. Since $\gamma = a$ we have $\lim_{n \to \infty} P(\Gamma_n^c \cap A_n) = 0$. □

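Proof of Theorem 3.5

We follow the proof in Bunea et al. (2006) to prove the theorem under FDR corrections. The case of
Bonferroni corrections follows with similar steps. If $\hat{I}$ is equal to $I_0$, we have $M_0$ rejections ($R = M_0$)
with none of them being erroneous ($V = 0$). Thus, the consistency of $\hat{I}$ is verified by showing that

$$P(\hat{I} = I_0) = P(R = M_0, V = 0) \to 1, \quad \text{as } n \to \infty. \qquad (10)$$

This follows by showing that both $P(R \neq M_0)$ and $P(V \ge 1)$ are asymptotically negligible. We have
that (Bunea et al. 2006, Lemma 2.1)

$$P(V \ge 1) \le P(R \neq M_0) + \frac{M_0 (M - M_0)}{M}\, q. \qquad (11)$$

Hence, in order to show consistency of $\hat{I}$ we need only show that $P(R \neq M_0) \to 0$. Let $q_M =
q / \sum_{i=1}^{M} i^{-1}$ and note that $\{R \neq M_0\} = \bigcup_{m = M_0 + 1}^{M} \{\pi_{(m)} \le q_M m/M\} \cup \{\pi_{(M_0)} > q_M M_0/M\}$, so that

$$\begin{aligned}
P(R \neq M_0) \le{} & P(A_n^c) + P(\Gamma_n^c \cap A_n) + P\left( \left\{ \pi_{(M_0)} > q_M \frac{M_0}{M} \right\} \cap \Gamma_n \cap A_n \right) \\
& + \sum_{m = M_0 + 1}^{M} P\left( \left\{ \pi_{(m)} \le q_M \frac{m}{M} \right\} \cap \Gamma_n \cap A_n \right), \qquad (12)
\end{aligned}$$

where $A_n = \{|\hat{\sigma} - \sigma| < a\}$, with $a = \sqrt{\log(n)/n}$, and $\Gamma_n$ is the event defined in Lemma 3.4. The third
term on the right hand side of (12) satisfies

$$P\left( \left\{ \pi_{(M_0)} > q_M \frac{M_0}{M} \right\} \cap \Gamma_n \cap A_n \right) \le M_0 \max_{m \in I_0} P\left( \left\{ \pi_m > q_M \frac{M_0}{M} \right\} \cap A_n \right)
= O\left( M_0 \left( q_M \frac{M_0}{M} + a \right) \right) = o(1), \quad \text{as } n \to \infty,$$

by Lemma 3.3 and the assumptions of the theorem. For the last term in (12) we have

$$\begin{aligned}
\sum_{m = M_0 + 1}^{M} P\left( \left\{ \pi_{(m)} \le q_M \frac{m}{M} \right\} \cap \Gamma_n \cap A_n \right) &\le \sum_{m = M_0 + 1}^{M} P\left( \{\pi_{(m)} \le q_M\} \cap \Gamma_n \cap A_n \right) \\
&\le \sum_{m \notin I_0} P\left( \{\pi_m \le q_M\} \cap A_n \right) = O\left( (M - M_0)(q_M + a) \right) = o(1), \quad \text{as } n \to \infty,
\end{aligned}$$

by Lemma 3.3 and the assumptions of the theorem. This shows that $P(\{R \neq M_0\}) \to 0$. Following (11)
with the choice of $q$, we can conclude that $\hat{I}$ is consistent, i.e., $\lim_{n \to \infty} P(\hat{I} = I_0) = 1$. □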

References 





[1] Abramovich, F., Benjamini, Y., Donoho, D.L., and Johnstone, I.M. (2006). Adapting to unknown
sparsity by controlling the false discovery rate. The Annals of Statistics, 34, 584-653.

[2] Aneiros, G., Ferraty, F. and Vieu, P. (2011). Variable Selection in Semi-Functional Regression Models.
Recent Advances in Functional Data Analysis and Related Topics, Contributions to Statistics 57, 17-22.

[3] Aneiros, G. and Vieu, P. (2013). Testing linearity in semi-parametric functional data analysis.
Computational Statistics, 28, 413-434.

[4] Aneiros, G. and Vieu, P. (2014). Variable selection in infinite-dimensional problems. Statistics and
Probability Letters, 94, 12-20.

[5] Aneiros, G., Vieu, P. (2015). Partial linear modeling with multi-functional covariates. Computational 
Statistics, Online ISSN: 1613-9658. 

[6] Benjamini, Y. and Yekutieli, D. (2001). The control of the false discovery rate in multiple testing
under dependency. Annals of Statistics, 29, 1165-1188.

[7] Bongiorno, E. G., Salinelli, E., Goia, A. and Vieu, P. (2014). Contributions in infinite-dimensional
statistics and related topics. Società Editrice Esculapio.

[8] Bunea, F., Wegkamp, M., and Auguste, A. (2006). Consistent variable selection in high dimensional 
regression via multiple testing. Journal of Statistical Planning and Inference 136, 4349-4364. 

[9] Cardot, H., Goia, A., and Sarda, P. (2004). Testing for No Effect in Functional Linear Regression 
Models, Some Computational Approaches. Com. in Stat. - Simul. and Computation, 33, 179-199. 

[10] Cuevas, A. (2014). A partial overview of the theory of statistics with functional data. J. of Statistical 
Planning and Inference, 147, 1-23. 

[11] Fan, J., and Li, R. (2004). New Estimation and Model Selection Procedures for Semiparametric
Modeling in Longitudinal Data Analysis. JASA, 99, 710-723. 

[12] Ferraty, F. and Vieu, P. (2006). Nonparametric Functional Data Analysis, Theory and Practice. 
Springer Series in Statistics. 

[13] Ferraty, F. and Vieu, P. (2009). Additive prediction and boosting for functional data. Computational
Statistics & Data Analysis, 53, 1400-1413. 

[14] Ferraty, F., Laksaci, A., Tadj, A. and Vieu, P. (2010). Rate of uniform consistency for nonparametric 
estimates with functional variables. J. Statistical Planning Inference, 140, 335-352. 




[15] Gertheiss, J., Maity, A., and Staicu, A.M. (2013). Variable Selection in Generalized Functional Linear 
Models. Stat, 2, 86-101. 

[16] Goia, A. and Vieu, P. (2014). A partitioned Single Functional Index Model. Computational Statistics, 
Online ISSN 1613-9658. 

[17] Horvath, L. and Kokoszka, P. (2012) Inference for Functional Data with Applications. Springer 
Series in Statistics. 

[18] Hong, Z. and Lian, H. (2011). Inference of genetic networks from time course expression data using 
functional regression with lasso penalty. Commun. in Statistics - Theory and Methods, 40, 1768-1779. 

[19] James, G. M. (2002). Generalized linear models with functional predictors. JRSS-B, 64, 411-432. 

[20] James, G., Wang, J. and Zhu, J. (2009). Functional linear regression that’s interpretable. Ann. 
Statist., 37, 2083-2108. 

[21] Kayano, M., and Konishi, S. (2009). Functional principal component analysis via regularized Gaussian
basis expansions and its application to unbalanced data. JSPI, 139, 2388-2398.

[22] Kong, D., Staicu, A.M., and Maity, A. (2013). Classical testing in functional linear models. North
Carolina State University, Dept. of Statistics, Technical Reports 2647, 1-23.

[23] Ma, S., Song, Q. and Wang, L. (2013). Simultaneous variable selection and estimation in
semiparametric modeling of longitudinal/clustered data. Bernoulli, 19, 252-274.

[24] Matsui, H., and Konishi, S. (2011). Variable selection for functional regression models via the $L_1$
regularization. Computational Statistics and Data Analysis 55, 3304-3310.

[25] McLean, M.W., Hooker, G., and Ruppert, D. (2014). Restricted likelihood ratio tests for linearity
in scalar-on-function regression. Statistics and Computing, DOI: 10.1007/s11222-014-9473-1.

[26] Meinshausen, N., Meier, L. and Bühlmann, P. (2009). p-Values for High-Dimensional Regression.
JASA, 104, 1671-1681. 

[27] Mingotti, N., Lillo, R. E., and Romo, J. (2013). Lasso variable selection in functional regression. 
Statistics and Econometrics Series 13, Working paper 13-14. 

[28] Pomann, G.M., Staicu, A.M., and Ghosh, S. (2014). Two Sample Hypothesis Testing for Functional 
Data. North Carolina State University, Dept. of Statistics, Preprint Submitted.

[29] Ramsay, J.O., and Silverman, B. W. (2005). Functional Data Analysis , 2nd ed. Springer, New York. 

[30] Reif, U. (1997). Orthogonality of cardinal B-splines in weighted Sobolev spaces. SIAM J. Math. 
Anal., 28, 1258-1263. 




[31] Rencher, A. C. and Schaalje, G. B. (2008). Linear Models in Statistics, 2nd ed. Wiley, New Jersey. 

[32] Swihart, B.J., Goldsmith, J., and Crainiceanu, C.M. (2013). Restricted likelihood ratio tests for 
functional effects in the functional linear model. Technometrics. DOI: 10.1080/00401706.2013.863163. 

[33] Yang, X., and Nie, K. (2008). Hypothesis testing in functional linear regression models with Neyman’s 
truncation and wavelet thresholding for longitudinal data. Statistics in Medicine , 27, 845-863. 

[34] Zambom, A.Z., and Akritas, M.G. (2014). Nonparametric lack-of-fit testing and consistent variable 
selection. Statistica Sinica 24, 1837-1858. 




